Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware
Authors
Abstract
The CUDA model for GPUs presents the programmer with a plethora of programming options. These include different memory types, different memory access methods, and different data types. Identifying which options to use, and when, is a non-trivial exercise. This paper explores the effect of these options on the performance of a routine that evaluates sparse matrix-vector products across three generations of NVIDIA GPU hardware. A process for analysing performance and selecting the subset of implementations that perform best is proposed. The potential for mapping sparse matrix attributes to the optimal CUDA sparse matrix-vector product implementation is discussed.
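The operation under study is the sparse matrix-vector product y = Ax. As a point of reference for the GPU implementations the paper compares, the sketch below shows the computation on the CPU using the CSR (compressed sparse row) layout — a common starting point for GPU SpMV kernels. The function name and data layout here are illustrative, not taken from the paper.

```python
# Minimal CPU reference for the sparse matrix-vector product y = A*x,
# using the CSR (compressed sparse row) layout. Names are illustrative.

def csr_spmv(row_ptr, col_idx, vals, x):
    """Multiply a CSR-format sparse matrix by a dense vector x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # The non-zeros of row i occupy vals[row_ptr[i]:row_ptr[i+1]];
        # col_idx[k] gives the column of the k-th stored entry.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# 3x3 example: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 3.0, 5.0]
print(csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # → [5.0, 2.0, 8.0]
```

A GPU version typically assigns one thread (or one warp) per row of this outer loop; the memory type and access method chosen for `vals`, `col_idx`, and `x` are exactly the CUDA options whose performance impact the paper evaluates.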
Similar resources
Sparse linear algebra on a GPU
We investigate what graphics processing units (GPUs) have to offer compared to central processing units (CPUs) when solving a sparse linear system of equations. This is performed by using a GPU to simulate fluid flow in a porous medium. The flow problems are discretized mainly by a mimetic finite element discretization, but also by a two-point flux approximation (TPFA) method. Both of thes...
Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators
Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming language extensions (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized n...
Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators
Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming languages (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized numerical k...
The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs
Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation to perform SpMV on the GPU. While previous work shows experiments on small to medium-sized sparse matrices, we perform evaluations on large sparse m...
A Survey on Performance Modelling and Optimization Techniques for SpMV on GPUs
A sparse matrix is a matrix consisting of very few non-zero entries. Large sparse matrices are often used in engineering and scientific computations. Sparse matrix-vector multiplication in particular is an important operation for solving linear systems and partial differential equations. However, there is a possibility that even though the matrix is partitioned and stored appropriately, the performance...
Journal: Concurrency and Computation: Practice and Experience
Volume 24
Pages: -
Published: 2012